What is the Population of Interest: Population Modeling for BayesDB
Abstract
BayesDB [1, 2] is a probabilistic programming platform that enables users to solve probabilistic data analysis problems using a simple, SQL-like language. Queries execute against generative population models (GPMs), a new abstraction that can be used to integrate data, metadata, qualitative domain knowledge, and quantitative models. Baseline quantitative models are typically built via an AI modeling assistant and then refined by end users. A key challenge in using BayesDB is designing the template, or population schema, that defines the conceptual population of interest. The issues in population design are broadly analogous to those in relational database schema design, with additional concerns for statistical validity, modeling, and inference queries. Basic population modeling starts with considerations like:
Similar resources
BayesDB: A probabilistic programming system for querying the probable implications of data
Is it possible to make statistical inference broadly accessible to non-statisticians without sacrificing mathematical rigor or inference quality? This paper describes BayesDB, a probabilistic programming platform that aims to enable users to query the probable implications of their data as directly as SQL databases enable them to query the data itself. This paper focuses on four aspects of Baye...
Cut-off Sampling Design: Take all, Take Some, and Take None
Extended Abstract. Sampling is the process of selecting units (e.g., people, organizations) from a population of interest so that by studying the sample we may fairly generalize our results back to the population from which they were chosen. To draw a sample from the underlying population, a variety of sampling methods can be employed, individually or in combination. Cut-off sampling is a pr...
BayesDB: Querying the Probable Implications
BayesDB, a Bayesian database table, lets users query the probable implications of their tabular data as easily as an SQL database lets them query the data itself. Using the built-in Bayesian Query Language (BQL), users with little statistics knowledge can solve basic data science problems, such as detecting predictive relationships between variables, inferring missing values, simulating probabl...
Horvitz-Thompson estimator of population mean under inverse sampling designs
Inverse sampling design is generally considered to be an appropriate technique when the population is divided into two subpopulations, one of which contains only a few units. In this paper, we derive the Horvitz-Thompson estimator for the population mean under inverse sampling designs, where subpopulation sizes are known. We then introduce an alternative unbiased estimator, corresponding to post-st...
Estimation of the Active Network Size of Kermanian Males
Background: Estimation of the size of hidden and hard-to-reach sub-populations, such as drug-abusers, is a very important but difficult task. Network scale up (NSU) is one of the indirect size estimation techniques, which relies on the frequency of people belonging to a sub-population of interest among the social network of a random sample of the general population. In this study, we estimated ...